The density matrix in quantum mechanics parameterizes the statistical properties of the system
under observation, just as a classical probability distribution does for classical systems. The
expectation value of an observable cannot be measured directly; it can only be approximated by
applying classical statistical methods to the frequencies with which certain measurement outcomes
(clicks) are obtained. In this paper, we make a detailed study of the statistical fluctuations obtained
during an experiment in which a hypothesis is tested, i.e. the hypothesis that a certain setup
produces a given quantum state. Although the classical and quantum problems are closely
related, the quantum problem is much richer due to the additional optimization over
the measurement basis. Just as in the case of classical hypothesis testing, the confidence in quantum
hypothesis testing scales exponentially in the number of copies. In this paper, we will 1) argue that
the physically relevant data of quantum experiments is only contained in the frequencies of the
measurement outcomes, and that the statistical fluctuations of the experiment are essential, so that
the correct formulation of the conclusions of a quantum experiment should be given in terms of
hypothesis tests, 2) show that the (classical) $\chi^2$ test for distinguishing two quantum states gives rise
to the quantum $\chi^2$ divergence when optimized over the measurement basis, 3) present a max-min
characterization for the optimal measurement basis for quantum goodness of fit testing, find
the quantum measurement which leads both to the maximal Pitman and Bahadur efficiency, and
determine the associated divergence rates.

PACS numbers:



The problem of quantum measurement has received a wide-ranging surge of interest because of ground-breaking
experiments in quantum information processing [H-Q . A fundamental feature of quantum measurements is the peculiar
interplay between the quantum and classical world: a quantum measurement gives rise to "classical clicks", i.e.
individual samples, and the only information that can be obtained when observing a quantum system is contained in
the frequencies of the possible measurement outcomes. Let us consider a quantum experiment in which we receive
a large but finite number of identical copies of the state $\sigma$. As the number of measurements that can be done is
obviously bounded, there is no way by which two quantum systems whose density matrices are very close to each
other can be distinguished exactly. In other words, it is fundamentally impossible to certify that a given system is in
a particular quantum state $\sigma$: the only thing we can aim for is to certify that all the data collected in the experiment
is compatible with the hypothesis that we sampled from the state $\sigma$.

Exactly the same problem is present in classical statistics: it is impossible to certify that one is sampling from
a given distribution; one can only gain confidence that the samples are compatible or not with the assumption that they
are taken from a given distribution. Formally, the only thing achievable in a classical statistical experiment is to
accept or reject a hypothesis. In the given setting, we take as the null hypothesis the statement that the distribution that
we are sampling from has certain features, and we want to check whether the obtained data are compatible with this
hypothesis. In practice, this means that a confidence interval has to be defined in which the hypothesis is accepted or
rejected; for example, in the case of a zero-mean normally distributed random variable with standard deviation 1, the
confidence interval corresponding to 95% confidence would be approximately [-2, 2]. The hypothesis is rejected when the
experiment yields an outcome outside of this confidence interval, and accepted otherwise. Note that acceptance of the
hypothesis does not imply that the hypothesis is true; it only indicates that the observed data are compatible with
the hypothesis.
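
As a quick numerical check of the confidence interval quoted above, the following sketch (plain Python, standard library only; the function name `normal_coverage` is our own choice and does not come from the text) evaluates the probability that a zero-mean, unit-variance normal variable falls inside [-c, c]. It illustrates that [-2, 2] covers slightly more than 95%, while the exact two-sided 95% interval ends near 1.96.

```python
import math

def normal_coverage(c):
    """Probability that a standard normal variable lies in the interval [-c, c]."""
    return math.erf(c / math.sqrt(2.0))

# Coverage of a few symmetric intervals around zero.
coverage_table = {c: normal_coverage(c) for c in (1.0, 1.96, 2.0)}
```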

Such a framework for hypothesis testing was developed one century ago by Pearson and Fisher 0, Q , and forms the
backbone for many more advanced techniques. One of the most successful tests is the so-called $\chi^2$ test. Its success has
to do with the fact that it is universal ^ : the confidence intervals that can be defined are independent of the details
of the distribution corresponding to the null hypothesis, as only the number of degrees of freedom plays a role. Also,
the $\chi^2$ test is in practice already applicable when relatively few samples are taken. The $\chi^2$ test essentially measures
the fluctuations around the expected frequencies of the possible outcomes: if those fluctuations are too small or too
large, the hypothesis is rejected.

Fluctuations obviously also play a central role in quantum measurements. The expectation value of an observable
is not something that can be measured; it can only be sampled, and the precision improves as more
measurements are done. This actually means that the expectation value of an observable is not physical: only
the individual samples (clicks) are physical. Expectation values can only be approximated using the frequencies of
the different outcomes.
As a consequence, quantum mechanics should be reformulated in terms of observable quantities, i.e. clicks, and
expectation values of a quantum observable are certainly not observable. For example, the Heisenberg uncertainty
relation is formulated in terms of an expectation value and is therefore not physical; it has to be reformulated in terms
of clicks in order to get an operational meaning.

The topic of this paper is to make a detailed analysis of how the $\chi^2$ hypothesis test, when applied to the frequencies
obtained from quantum measurements, reveals information about the underlying quantum states. A particular
complication in the quantum setting that makes the problem much richer is the fact that we have the additional choice
of the basis in which the measurements are done. The specific questions that we will address are:

1. How to set up the $\chi^2$ test in the quantum setting; how many degrees of freedom does the test have?

2. Suppose that we want to gain confidence that we prepared a certain quantum state $\sigma$ in the lab; what is the
optimal POVM measurement such that, for all states for which $\|\rho - \sigma\| \ge \epsilon$, we would reject the hypothesis
with the least number of measurements if the state were $\rho$ instead of $\sigma$?

3. What is the associated divergence rate for rejecting a false hypothesis?

4. What is the relationship between the classical $\chi^2$ distance defined on measured frequencies versus the quantum
$\chi^2$ distance?

This paper fits into a long series of papers that were concerned with quantum parameter estimation and quantum
hypothesis testing. A wealth of results has been reported in the seminal books of Helstrom [l^ and Holevo,
in a series of papers of Wootters [12] and other pioneers of the field of quantum information theory [l^ . The
more recent developments are covered in the books of Hayashi [15] and Petz [16]. Very recently, breakthroughs were
obtained in defining confidence intervals in the context of quantum tomography and testing of fidelity [17-20]. The
present paper develops similar ideas in the context of hypothesis testing.

$\chi^2$ hypothesis testing is fundamentally different from the Neyman-Pearson test as usually discussed in quantum
hypothesis testing. As opposed to Neyman-Pearson tests, the $\chi^2$ test is perfectly well defined without the need
to formulate an alternative hypothesis. Such a situation arises precisely when we want to test whether a certain
quantum state has been created in the lab. We will also focus on separable measurements, i.e. individual measurements
on individual samples, as opposed to entangled measurements such as typically considered in Neyman-Pearson tests
[21]. Therefore, the analysis presented here can immediately be used in current experiments.



I. THE $\chi^2$ TEST FOR QUANTUM MEASUREMENTS



Let us assume that we have an experimental quantum apparatus that supposedly spits out quantum states
characterized by the density matrix $\sigma$. We would like to gain confidence that this hypothesis is true by performing
measurements on it. The most general measurement strategy would correspond to the case where different positive
operator valued measurements (POVM) $\{E_{\alpha,i}\}$ are chosen, with $E_{\alpha,i} \ge 0$, $\sum_\alpha E_{\alpha,i} = 1$, and where the POVM
$\{E_{\cdot,i}\}$ is measured a predetermined $k_i$ times (the fact that $k_i$ is not a random number is important for the
determination of the number of degrees of freedom in the $\chi^2$ test). Such a typical setup for qubits would correspond to
choosing $k_i = n/3$ with $n$ the total number of measurements done, and von Neumann measurements in the bases
$\sigma_x = E_{1,1} - E_{2,1}$, $\sigma_y = E_{1,2} - E_{2,2}$, and $\sigma_z = E_{1,3} - E_{2,3}$ respectively. Alternatively, one could choose $k_1 = n$ and
do $n$ measurements with an informationally complete POVM. We will henceforth consider the situation where only 1
POVM is used to do the measurements; we will discuss how to modify the results in the case that different POVM's
are chosen deterministically, but the results essentially remain the same.

We denote by $H$ the hypothesis that the $n$ samples we have obtained originate from doing quantum
measurements on identical copies of the quantum state $\sigma$. The measurement can be described by a POVM with $r$
elements $\{E_i\}_{i=1\ldots r}$ which obey $\sum_i E_i = 1$ and where all the individual elements of the POVM are positive
semidefinite, $E_i \ge 0$. We say that the measurement has $r$ possible outcomes labeled by $i$ and associate a probability $p_i$
to each outcome which is given by Born's rule as $p_i = \mathrm{Tr}[E_i \sigma]$. If we record the number of times $n_i$ that we
have obtained some outcome $i$, then we can construct the empirical distribution $f_i = n_i/n$ for the total number $n$ of
samples. By the law of large numbers Q, we expect that as $n \to \infty$ the empirical distribution converges, $f_i \to p_i$.
However, in any realistic scenario, we can only draw a finite number of samples. Due to the inherent randomness of
the quantum measurements, there will be fluctuations.
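
To make the relation between Born's rule and the empirical frequencies concrete, here is a minimal sketch in plain Python. The qubit state, the two-outcome measurement, and all function names are hypothetical choices made for illustration; they are not taken from the text. The sketch computes $p_i = \mathrm{Tr}[E_i \sigma]$ and then draws $n$ classical "clicks" to form $f_i = n_i/n$.

```python
import random

def trace_product(a, b):
    """Tr[a b] for two square matrices given as nested lists."""
    d = len(a)
    return sum(a[i][k] * b[k][i] for i in range(d) for k in range(d))

def born_probabilities(povm, sigma):
    """Outcome probabilities p_i = Tr[E_i sigma] for every POVM element E_i."""
    return [trace_product(e, sigma).real for e in povm]

def sample_frequencies(probs, n, rng):
    """Draw n outcomes and return the empirical frequencies f_i = n_i / n."""
    counts = [0] * len(probs)
    for outcome in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[outcome] += 1
    return [c / n for c in counts]

# Hypothetical example: a qubit density matrix measured in the computational basis.
sigma = [[0.7 + 0j, 0.2 + 0j], [0.2 + 0j, 0.3 + 0j]]
povm = [
    [[1 + 0j, 0j], [0j, 0j]],   # E_1 = |0><0|
    [[0j, 0j], [0j, 1 + 0j]],   # E_2 = |1><1|
]
probs = born_probabilities(povm, sigma)
freqs = sample_frequencies(probs, 20000, random.Random(0))
```

For finite $n$ the entries of `freqs` scatter around `probs`; these are exactly the fluctuations that the test statistic of the next subsection quantifies.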
A. The $\chi^2$ distribution

We are now confronted with the problem of accepting or rejecting the hypothesis $H$ in light of only finitely many
samples. Since we cannot be certain that the hypothesis $H$ is false, we seek to give bounds on the error probability of
rejecting the hypothesis. This is exactly the scenario of the classical $\chi^2$-test (see e.g. [8]). The collected frequencies
$\{n_i\}$ are distributed according to the multinomial distribution

$$ P(n_1, \ldots, n_r) = \frac{n!}{n_1! \cdots n_r!}\, p_1^{n_1} \cdots p_r^{n_r}, \tag{1} $$

where $n = n_1 + n_2 + \ldots + n_r$. The distributions of the individual $n_i$ can be computed as the marginals and are
distributed according to the binomial

$$ P(n_i) = \binom{n}{n_i}\, p_i^{n_i} (1 - p_i)^{n - n_i}. \tag{2} $$

In the asymptotic limit, i.e. for large values of $n$, the multinomial distribution converges to the normal distribution

$$ P(n_1, \ldots, n_r) \propto \exp\left[ -\frac{1}{2} \sum_{i=1}^{r} \frac{(n_i - n p_i)^2}{n p_i} \right]. \tag{3} $$



This suggests that the random variable

$$ \chi^2 = \sum_{i=1}^{r} \frac{(n_i - n p_i)^2}{n p_i} \tag{4} $$

is a good measure for testing whether we are sampling from $\{p_i\}$, as it measures the deviation of the empirical
distribution $f_i$ from the ideal distribution $p_i$. This random variable indeed forms the basis for the celebrated
$\chi^2$-test, originally introduced by Pearson [8]. $\chi^2$ is obviously a nonnegative random variable. A crucial property of this
random variable is the fact that its expectation value is independent of $n$ and is equal to $r - 1$ if the samples are
indeed drawn from the distribution $\{p_i\}$. This follows directly from the fact that the individual random variables $n_i$
are distributed according to the binomial distribution and hence $E[(n_i - n p_i)^2] = n p_i (1 - p_i)$; it then follows that

$$ E[\chi^2] = \sum_{i=1}^{r} \frac{E[(n_i - n p_i)^2]}{n p_i} = \sum_{i=1}^{r} (1 - p_i) = r - 1. $$
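
The statement that the expectation value of the statistic equals $r - 1$ for every $n$ can be checked exactly for small sample sizes by enumerating all multinomial outcomes. The sketch below is an illustration in plain Python under our own naming (`exact_moments` and the example distribution are hypothetical); it also returns the exact variance, which for finite $n$ differs from the asymptotic value $2(r-1)$ by a term of order $1/n$.

```python
import math
from itertools import product

def multinomial_pmf(counts, probs):
    """Probability of observing the given counts under the multinomial law."""
    coeff = math.factorial(sum(counts))
    for c in counts:
        coeff //= math.factorial(c)
    weight = float(coeff)
    for c, p in zip(counts, probs):
        weight *= p ** c
    return weight

def chi_squared(counts, probs):
    """Pearson statistic sum_i (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def exact_moments(probs, n):
    """Exact mean and variance of the statistic; feasible for small n and r only."""
    r = len(probs)
    mean = second = 0.0
    for head in product(range(n + 1), repeat=r - 1):
        if sum(head) > n:
            continue
        counts = head + (n - sum(head),)   # last count is fixed by sum_i n_i = n
        w = multinomial_pmf(counts, probs)
        x = chi_squared(counts, probs)
        mean += w * x
        second += w * x * x
    return mean, second - mean ** 2

example_probs = [0.5, 0.3, 0.2]          # hypothetical distribution with r = 3
mean_12, var_12 = exact_moments(example_probs, 12)
```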



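Similarly, the variance of the random variable $\chi^2$ is given by

$$ \mathrm{Var}[\chi^2] = 2(r - 1) + \frac{1}{n} \left( \sum_{i=1}^{r} \frac{1}{p_i} - r^2 - 2r + 2 \right), $$

which also converges fast to a finite value. In practice, when $\forall i : n p_i > 5$, statisticians use the following asymptotic
form of the distribution for the $\chi^2$ variable:

$$ P_{r-1}(x) = \frac{x^{\frac{r-3}{2}} \exp\left(-\frac{x}{2}\right)}{2^{\frac{r-1}{2}}\, \Gamma\!\left(\frac{r-1}{2}\right)}. \tag{5} $$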
For obvious reasons, this distribution is called the $\chi^2$-distribution, and is also the distribution which is obtained by
summing up $r - 1$ squares of random variables distributed following the normal distribution with expectation value
0 and variance 1. Note that this distribution depends neither on the original distribution $\{p_i\}$, nor on the total
number of measurements, but only on the number of possible independent measurement outcomes $r - 1$. This total
number of degrees of freedom is equal to the number of independent $n_i$ that have to be specified. For example, in the
case of a POVM with 4 elements, $r = 4$, but there is the constraint that $\sum_i n_i = n$, and we hence have 3 degrees of
freedom. In the case of the independent $\sigma_x$, $\sigma_y$, $\sigma_z$ measurements, there are 6 frequencies $n_i$, but only 3 of them are
independent, and hence we again have only 3 degrees of freedom.

Figure 1: Plot of the $\chi^2$-distribution $P_{r-1}(x)$, i.e. Eqn. (5), for $r = 4$ (red solid line) and $r = 9$ (grey dashed
line) respectively. The error probability $\alpha$, as given in the test protocol in Section II, can be computed as the integral
$\alpha = \int_{\chi_\alpha}^{\infty} P_{r-1}(x)\, dx$ and is indicated by the shaded area from $\chi_\alpha$ to $\infty$ underneath the right tail for the $r = 9$ distribution.
Note that the area underneath the tail decays exponentially fast.



For our purposes, the most important feature of the $\chi^2$ distribution is that the tails of this distribution decay
exponentially fast. The areas under the right and left tails are given by the upper and lower incomplete gamma
functions. It is interesting to note that the weight under the tail of this distribution from $0$ to $x$ is proportional to
$x^{(r-1)/2}$ for small $x$. This means that, for large enough degrees of freedom, it is possible to reject a hypothesis because
there are not enough fluctuations around the expected values: frequencies that match the expected frequencies too
well are highly unlikely, and an experiment reporting such values should be discredited!
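
The two tail weights mentioned above can be evaluated with a short numerical sketch. The code below is an illustration in plain Python, not part of the original analysis: it computes the regularized lower incomplete gamma function through its standard power series (the function names and the truncation parameter `terms` are our own choices) and uses it for the left and right tail of the $\chi^2$-distribution with a given number of degrees of freedom.

```python
import math

def regularized_lower_gamma(a, x, terms=500):
    """P(a, x) from the series gamma(a, x) = e^-x x^a sum_k x^k / (a (a+1) ... (a+k))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_lower_tail(x, dof):
    """Weight of the chi^2 distribution between 0 and x."""
    return regularized_lower_gamma(dof / 2.0, x / 2.0)

def chi2_upper_tail(x, dof):
    """Weight of the chi^2 distribution between x and infinity (the error probability)."""
    return 1.0 - chi2_lower_tail(x, dof)
```

For example, `chi2_upper_tail(15.507, 8)` is close to 0.05, the familiar 95% threshold for 8 degrees of freedom, and for small arguments `chi2_lower_tail(x, 3)` scales like $x^{3/2}$.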

B. The $\chi^2$ divergence

Let us now study what will happen when the samples are not drawn from the quantum state $\sigma$ but from the state $\rho$.
Then the measurement outcomes will not be distributed according to $p_i = \mathrm{Tr}[E_i \sigma]$ but according to the distribution
$q_i = \mathrm{Tr}[E_i \rho]$. The expectation value of $\chi^2$ becomes

$$ E[\chi^2] = \sum_i \frac{E[(n_i - n p_i)^2]}{n p_i} = \sum_i \frac{n^2 (q_i - p_i)^2 + n q_i (1 - q_i)}{n p_i} = n \sum_i \frac{(p_i - q_i)^2}{p_i} + \sum_i \frac{q_i (1 - q_i)}{p_i}. $$

The expectation value of $\chi^2$ grows linearly with the number of samples, and the multiplicative factor to this linear
divergence is defined as the $\chi^2$-divergence

$$ \chi^2(p, q) = \sum_i \frac{(p_i - q_i)^2}{p_i} = \sum_i \frac{q_i^2}{p_i} - 1. $$

Note that the $\chi^2$ divergence also follows naturally from a different divergence, the Kullback-Leibler divergence, in
the limit where the 2 distributions are close to each other:

$$ \sum_i p_i \log \frac{p_i}{q_i} = \frac{1}{2} \sum_i \frac{(p_i - q_i)^2}{p_i} + O(\|p - q\|^3). \tag{10} $$
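
The relations of this subsection are easy to verify numerically. The following sketch (plain Python; the function names and the example distributions are hypothetical) evaluates the classical $\chi^2$ divergence, compares it with the Kullback-Leibler divergence for two nearby distributions, and computes the expectation value of the test statistic when the counts follow $q$ while the statistic is built from $p$, assuming the multinomial model used above.

```python
import math

def chi2_divergence(p, q):
    """Classical chi^2 divergence sum_i (p_i - q_i)^2 / p_i."""
    return sum((pi - qi) ** 2 / pi for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def expected_statistic(p, q, n):
    """E[chi^2] for n samples drawn from q, with the statistic defined relative to p."""
    return n * chi2_divergence(p, q) + sum(qi * (1 - qi) / pi for pi, qi in zip(p, q))

p_example = [0.5, 0.3, 0.2]
q_example = [0.501, 0.299, 0.2]      # a small perturbation of p_example
kl_over_half_chi2 = kl_divergence(p_example, q_example) / (0.5 * chi2_divergence(p_example, q_example))
```

For nearby distributions the ratio `kl_over_half_chi2` is close to 1, and `expected_statistic(p, p, n)` reduces to $r - 1$ for any $n$.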



C. The quantum $\chi^2$ divergence

In the paper [22], a family of quantum versions of the $\chi^2$-divergence was introduced to study the convergence and
relaxation rates [23] of completely positive maps and general dissipative quantum systems. All members of this class
of quantum $\chi^2$-divergences reduce to the classical $\chi^2$ divergence when $\rho$ and $\sigma$ commute. The formulation of the
quantum versions of the $\chi^2$-divergence follows from the framework of monotone Riemannian metrics [23-29] and can
be seen as a special case of this family of metrics. It follows from the analysis of monotone Riemannian metrics that
the family of $\chi^2$-divergences has a partial order with a smallest and largest element. A special role is played by the
Bures $\chi^2$ divergence [13, HH, as it is always the smallest one of those quantum divergences. It is defined as

$$ \chi^2_B(\sigma, \rho) = \mathrm{Tr}\left[ (\rho - \sigma)\, \Omega_\sigma(\rho - \sigma) \right] \tag{11} $$

with $\Omega_\sigma$ the superoperator whose inverse is given by

$$ \Omega_\sigma^{-1}(X) = \frac{\sigma X + X \sigma}{2}. \tag{12} $$

Let us now show that an operational meaning can be given to this quantity by comparing it to the classical $\chi^2$
divergence maximized over all possible quantum measurements.

Lemma 1 For two states $\sigma$ and $\rho$ we denote the probability distributions $p_i = \mathrm{Tr}[E_i \sigma]$ and $q_i = \mathrm{Tr}[E_i \rho]$ for some
POVM $\{E_i\}_{i=1\ldots r}$. Then, the Bures $\chi^2_B$ divergence is equal to the maximum value of $\chi^2(p, q)$ when optimized over all
possible POVM measurements:

$$ \chi^2_B(\sigma, \rho) = \max_{\{E_i\}} \chi^2(p, q). \tag{13} $$

Furthermore, the measurement maximizing this $\chi^2$ divergence is a projective von Neumann measurement in the
eigenbasis of $\Omega_\sigma(\rho) = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|$.
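
The content of Lemma 1 can be illustrated for a single qubit. The sketch below is a simplified illustration under assumptions of our own: $\sigma$ is taken diagonal with eigenvalues `lam`, $\rho$ is real symmetric, and only projective measurements along real directions $(\cos\theta, \sin\theta)$ are scanned. In the eigenbasis of $\sigma$ the superoperator acts entrywise as $\Omega_\sigma(X)_{ab} = 2 X_{ab}/(\lambda_a + \lambda_b)$, which is what the code uses. All function names are hypothetical.

```python
import math

def bures_chi2(lam, rho):
    """Tr[(rho - sigma) Omega_sigma(rho - sigma)] for sigma = diag(lam)."""
    d = len(lam)
    total = 0.0
    for a in range(d):
        for b in range(d):
            delta = rho[a][b] - (lam[a] if a == b else 0.0)
            total += 2.0 * delta ** 2 / (lam[a] + lam[b])
    return total

def classical_chi2(lam, rho, theta):
    """Classical chi^2 divergence for the projective measurement rotated by theta."""
    directions = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]
    total = 0.0
    for x, y in directions:
        p = lam[0] * x * x + lam[1] * y * y
        q = rho[0][0] * x * x + 2.0 * rho[0][1] * x * y + rho[1][1] * y * y
        total += (p - q) ** 2 / p
    return total

def optimal_angle(lam, rho):
    """Rotation angle of the eigenbasis of Omega_sigma(rho) (real symmetric 2x2 case)."""
    a = rho[0][0] / lam[0]
    d = rho[1][1] / lam[1]
    b = 2.0 * rho[0][1] / (lam[0] + lam[1])
    return 0.5 * math.atan2(2.0 * b, a - d)

lam = (0.7, 0.3)                         # hypothetical hypothesis state sigma = diag(0.7, 0.3)
rho = [[0.6, 0.2], [0.2, 0.4]]           # hypothetical alternative state
quantum_value = bures_chi2(lam, rho)
best_classical = classical_chi2(lam, rho, optimal_angle(lam, rho))
```

In this example `best_classical` coincides with `quantum_value`, while any other measurement angle gives a smaller classical divergence.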
After completion of this work, it was brought to our attention that this theorem first appeared in a paper by
Braunstein and Caves about the geometry of quantum states [13]. The proof presented here has no similarity to this
original proof, and we present this new proof here because the tools used will turn out to be relevant for the later
sections; a central role is played by Woodbury's matrix identity [32].

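Proof: Let us first prove that the Bures $\chi^2_B$ divergence forms an upper bound to the $\chi^2$ divergence with respect to
any POVM $\{E_i\}$. We write for any operator $A \in M_D(\mathbb{C})$ on the Hilbert space $\mathbb{C}^D$ its vectorization as
$|A\rangle = A \otimes 1\, |I\rangle$, where $|I\rangle = \sum_{k=1}^{D} |kk\rangle$ denotes the unnormalized maximally entangled state. We use this notation because
superoperators become simple matrices in this representation:

$$ \hat{\Omega}_\sigma = \frac{2}{\sigma \otimes 1 + 1 \otimes \sigma^T}. \tag{14} $$

It is easy to see that the $\chi^2$ divergence is given by

$$ \chi^2(p, q) = \langle \rho | \sum_i \frac{|E_i\rangle\langle E_i|}{\mathrm{Tr}[E_i \sigma]} | \rho \rangle - 1 \tag{15} $$

and the Bures $\chi^2_B$ divergence by

$$ \chi^2_B(\sigma, \rho) = \langle \rho | \frac{2}{\sigma \otimes 1 + 1 \otimes \sigma^T} | \rho \rangle - 1. \tag{16} $$

It is therefore enough to prove the semidefinite matrix inequality

$$ \frac{2}{\sigma \otimes 1 + 1 \otimes \sigma^T} - \sum_i \frac{|E_i\rangle\langle E_i|}{\mathrm{Tr}[E_i \sigma]} \ge 0 \tag{17} $$

for all possible POVM's $\{E_i\}$. A matrix is positive if and only if its inverse is positive, and the inverse can easily
be calculated by making use of Woodbury's identity

$$ (A - U C U^\dagger)^{-1} = A^{-1} + A^{-1} U \left( C^{-1} - U^\dagger A^{-1} U \right)^{-1} U^\dagger A^{-1}. \tag{18} $$

Equation (17) is exactly of that form by choosing an orthonormal basis $|i\rangle$ with a number of elements equal to the
total number of POVM elements and

$$ A = \frac{2}{\sigma \otimes 1 + 1 \otimes \sigma^T}, \tag{19} $$

$$ U = \sum_i |E_i\rangle\langle i|, \tag{20} $$

$$ C = \sum_i \frac{1}{\mathrm{Tr}[E_i \sigma]}\, |i\rangle\langle i|. \tag{21} $$

As the matrix $A$ is obviously positive, (17) will hold if

$$ C^{-1} - U^\dagger A^{-1} U = \sum_i \mathrm{Tr}[E_i \sigma]\, |i\rangle\langle i| - \sum_{ij} |i\rangle \langle E_i | \frac{\sigma \otimes 1 + 1 \otimes \sigma^T}{2} | E_j \rangle \langle j| \ge 0 \tag{22} $$

or equivalently if the matrix $L = \sum_{ij} L_{ij} |i\rangle\langle j|$, with the entries

$$ L_{ij} = \begin{cases} \mathrm{Tr}[E_i (1 - E_i) \sigma] & i = j \\ -\frac{1}{2} \mathrm{Tr}[(E_i E_j + E_j E_i) \sigma] & i \ne j \end{cases} $$

is positive semidefinite. This is indeed true, as $-L$ is the generator of a Markovian semi-group as occurring in
master equations (the elements in all columns sum up to zero, and all off-diagonal elements are positive), which is
well known to have only non-positive eigenvalues.

Note that we can make $L$ equal to zero by choosing all POVM elements orthogonal to each other, i.e. by choosing
a von Neumann measurement. The null space of the matrix occurring in (17) can now easily be seen to be spanned
by the vectors in $A^{-1} U$. $\rho$ will therefore be in the null space and saturate the inequality iff there exist numbers $\{\lambda_i\}$
for which

$$ |\rho\rangle = \sum_i \lambda_i A^{-1} U |i\rangle = \sum_i \lambda_i\, \frac{\sigma \otimes 1 + 1 \otimes \sigma^T}{2}\, |E_i\rangle. \tag{23} $$

By writing $E_i = |\psi_i\rangle\langle\psi_i|$, this equation is equivalent to

$$ \sum_i \lambda_i |\psi_i\rangle\langle\psi_i| = \Omega_\sigma(\rho) \tag{24} $$

which shows that a von Neumann measurement in the eigenbasis of $\Omega_\sigma(\rho)$ will give equality. □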

This shows that the quantum $\chi^2$ divergences indeed have an operational meaning. It also illustrates the fact that
the problem of quantum hypothesis testing is much richer than the classical one: we have the extra optimization
over the measurement basis.



II. GOODNESS OF FIT FOR QUANTUM MEASUREMENTS 



We now come to the central part of the paper, which is concerned with the problem of testing whether the data acquired
during an experiment is compatible with the assumption that it is sampled from a given quantum state $\sigma$. Obviously, if we
would like to make the measurement which reveals the most information, it should be the one that allows us to
reject the hypothesis as soon as possible if the hypothesis is false.

We therefore define an $\epsilon$-ball around our hypothesis state $\sigma$, and will optimize over all possible POVM measurements
in such a way that we require that the (classical!) $\chi^2$ divergence with respect to all possible density matrices $\rho$ outside
of this ball, $\|\rho - \sigma\| \ge \epsilon$, is as large as possible. Due to the quadratic nature of the $\chi^2$ divergence, the natural norm
to use is the Frobenius norm (i.e. $\|X\| = \sqrt{\mathrm{Tr}[X^\dagger X]}$); all bounds derived for the Frobenius norm can however be
converted to any other norm such as the infinity or trace distance by using well known inequalities.

Clearly, the optimal POVM should be an informationally complete POVM, as otherwise there would always be
directions in which the $\chi^2$ divergence is zero. The properties of the optimal POVM will be discussed in the following
Section III.

The aforementioned discussion leads us to define the following quantity.
Definition 2 The divergence rate $\xi$ for the quantum goodness of fit test for the state $\sigma$ is given by

$$ \xi(\sigma) = \frac{1}{\epsilon^2} \max_{\{E_i\}} \min_{\rho} \chi^2(p, q), \tag{25} $$

where we have defined the classical $\chi^2$-divergence

$$ \chi^2(p, q) = \left( \sum_i \frac{q_i^2}{p_i} \right) - 1, \tag{26} $$

with respect to the induced probability distributions $p_i = \mathrm{Tr}[E_i \sigma]$ and $q_i = \mathrm{Tr}[E_i \rho]$. The optimization is performed
over all possible POVM $\{E_i\}_{i=1\ldots r}$ and states $\rho$ for which $\|\rho - \sigma\| \ge \epsilon$ as measured by the Frobenius norm.
Note that, due to the quadratic nature of $\chi^2$, $\xi(\sigma)$ is independent of $\epsilon$. As will be proved in the next section, the
divergence rate $\xi(\sigma)$ is guaranteed to lie in a small interval; the matching upper and lower bounds are derived in Section III.

This bound is actually very important: it shows that the prefactor of the linear term of the expectation value of
$\chi^2$ is independent of the dimension of the Hilbert space, which is of course crucial for the quantum hypothesis
testing to make sense and to be scalable. Furthermore, $\xi(\sigma)$ and the corresponding optimal POVM can be calculated
exactly as the solution of a simple eigenvalue problem; see Theorem 5. As discussed later, the optimal POVM turns
out to be optimal both in the sense of Pitman [33] and Bahadur [33].

A goodness of fit test protocol for the state $\sigma$ is then given as follows:

1. Choose the $r$-element POVM $\{E_i^*\}$ that optimizes $\xi$ as given in Definition 2.

2. Measure $\{E_i^*\}$ on $n$ independent samples of the state $\rho$ and record the frequencies $n_i$ of the $i$'th outcome.

3. Compute the test statistic $\chi^2 = \sum_i \frac{(n_i - n p_i)^2}{n p_i}$, where $p_i = \mathrm{Tr}[E_i^* \sigma]$ corresponds to the hypothesis $H$.

4. Reject the hypothesis with error probability $\alpha$ if $\chi^2 > \chi_\alpha$, where the constant $\chi_\alpha$ is determined via
$\alpha = \int_{\chi_\alpha}^{\infty} P_{r-1}(x)\, dx$.

5. If the test statistic $\chi^2$ is smaller than $\chi_\alpha$, we state that the observed data is consistent with the hypothesis $H$
up to a statistical error $\alpha$.
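
Steps 3 to 5 of the protocol are purely classical post-processing and can be sketched in a few lines. The code below is a minimal illustration in plain Python; the function names and the example counts are hypothetical. To avoid relying on tabulated thresholds, the example uses $r = 3$ outcomes, for which the $\chi^2$ distribution has 2 degrees of freedom and its right tail is exactly $\exp(-x/2)$, so that $\chi_\alpha = -2 \ln \alpha$.

```python
import math

def chi_squared_statistic(counts, probs):
    """Test statistic sum_i (n_i - n p_i)^2 / (n p_i) built from the recorded counts."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def goodness_of_fit(counts, probs, critical_value):
    """Return the statistic and whether the hypothesis is rejected at this threshold."""
    n = sum(counts)
    if any(n * p <= 5 for p in probs):
        raise ValueError("rule of thumb n * p_i > 5 violated; asymptotic form unreliable")
    statistic = chi_squared_statistic(counts, probs)
    return statistic, statistic > critical_value

# Hypothetical example with r = 3 outcomes (2 degrees of freedom) and alpha = 0.05.
alpha = 0.05
critical = -2.0 * math.log(alpha)        # exact threshold for 2 degrees of freedom
hypothesis_probs = [0.5, 0.3, 0.2]
stat_ok, rejected_ok = goodness_of_fit([48, 35, 17], hypothesis_probs, critical)
stat_bad, rejected_bad = goodness_of_fit([30, 30, 40], hypothesis_probs, critical)
```

The first data set is compatible with the hypothesis, whereas the second one is rejected. For other numbers of outcomes the threshold has to be obtained from the incomplete gamma function or from standard tables.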

Note that we assumed the large $n$ limit to compute the distribution function for the $\chi^2$ variable. This assumption
is generally well satisfied if we take sufficiently many samples. A good rule of thumb is to make sure
that $n p_i > 5$ for all $i$. Furthermore, note that the test can be made slightly more advanced by also rejecting the
hypothesis when the fluctuations are not large enough, i.e. if the $\chi^2$ value is too small!

If we now turn to Definition 2 of the divergence rate, we can give it a meaningful interpretation in the light of
the test protocol. The goal of the optimization is to construct a test, i.e. a quantum measurement, which rules out
the hypothesis $H$ with as few samples as possible if $H$ is not true. That is, we want the test statistic $\chi^2$ to grow
as fast as possible with the number of samples $n$. In light of Eqn. (6), we see that the expectation value of the $\chi^2$
random variable grows linearly in the number of samples $n$ with the prefactor $\chi^2(p, q)$. In the case where $\rho = \sigma$ and
thus $p = q$, i.e. when $H$ is true, the classical $\chi^2$ divergence vanishes and we obtain the expectation value $r - 1$ and a standard
deviation of $\sqrt{2(r - 1)}$. When $\rho \ne \sigma$, the goal is to find the measurement that reaches the critical region indicated
by $\chi_\alpha$ as fast as possible, in the worst case.

We therefore have a class of estimators, parameterized by the different possible POVM's $E$, and we want to find
the most efficient one. Associated to every POVM $E$, there is a worst case state $\rho_E$ with $\|\rho_E - \sigma\|_2 \ge \epsilon$ which gives
rise to a divergence rate $\xi_E$. The expected number of samples $n$ needed to exceed the threshold $\chi_\alpha$ set by the error
probability $\alpha$ of the test is given by the formula

$$ (r - 1) + (n - 1)\, \epsilon^2 \xi_E \ge \chi_\alpha \tag{29} $$

or

$$ n \ge 1 + \frac{\chi_\alpha - (r - 1)}{\epsilon^2 \xi_E}. \tag{30} $$

This is the expected number of samples necessary to reject the hypothesis if it is untrue.
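
The scaling of the required number of samples can be made explicit with a small helper. This is only a sketch under a stated assumption: we take the expectation value of the statistic under the worst case state to be $(r - 1) + (n - 1)\epsilon^2 \xi_E$ and ask when it reaches the threshold $\chi_\alpha$. The function name `samples_needed` and the numbers in the example are hypothetical.

```python
import math

def samples_needed(chi_alpha, r, eps, xi_e):
    """Smallest n with (r - 1) + (n - 1) * eps^2 * xi_E >= chi_alpha (assumed growth law)."""
    excess = chi_alpha - (r - 1)
    if excess <= 0:
        return 1
    return math.ceil(1 + excess / (eps ** 2 * xi_e))

# Hypothetical numbers: threshold 10 for r = 3 outcomes.
n_coarse = samples_needed(10.0, 3, 0.5, 1.0)     # distinguishability eps = 0.5
n_fine = samples_needed(10.0, 3, 0.25, 1.0)      # halving eps
n_weak = samples_needed(10.0, 3, 0.5, 0.5)       # halving the divergence rate
```

Halving $\epsilon$ roughly quadruples the number of samples, while halving the divergence rate $\xi_E$ roughly doubles it, which is why the POVM with the largest $\xi_E$ is the most efficient one.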



Now there are several possible notions of efficiency for asymptotic tests. For the so-called Pitman efficiency
[33], we compare tests in such a way that $\alpha$ is fixed but $\epsilon \to 0$ gradually, and look at the scaling
of $n$ as a function of $\epsilon$. Obviously, the POVM that minimizes $n$ is the one for which $\xi_E$ is maximal, i.e. the
POVM that is optimal with respect to the definition of $\xi(\sigma)$. Note that this POVM is also
optimal according to Pitman for the maximum likelihood. Different tests can also be compared with respect
to the Bahadur efficiency [33]. In the framework of Bahadur, $\epsilon$ is fixed, but the error $\alpha$ is made smaller and
smaller (which corresponds to a larger and larger $\chi_\alpha$), and the scaling of $n$ with respect to $\alpha$ is compared. The
optimal POVM which maximizes $\xi$ is obviously also the one with maximal Bahadur efficiency. The optimal quantum
measurement is therefore the one with maximal Pitman and Bahadur efficiency within the class of all quantum $\chi^2$ tests.

Note that the standard deviation of $\chi^2$ is $\sqrt{2(r - 1)}$. Therefore, $\chi_\alpha - (r - 1)$ for a fixed $\alpha$ but varying dimension of
the Hilbert space is proportional to the square root of the number of degrees of freedom, i.e. linear in the dimension
of the Hilbert space.



III. DIVERGENCE RATE AND OPTIMAL POVM 



Let us next get some insights into the structure of the optimal POVM measurement. If the state $\sigma$ is full rank, the
POVM must be informationally complete, so the number of POVM elements has to be at least equal to the square
of the dimension of the Hilbert space, i.e. $r \ge D^2$, as otherwise there are always perturbations $X$ around the state $\sigma$
for which $\mathrm{Tr}[E_i X] = 0$. We will now prove that all the elements $E_i$ of the POVM must be pure, which is intuitively
obvious. Then we will go on to prove matching upper and lower bounds to the quantity $\xi(\sigma)$. The lower bound is
constructive, and hence gives an explicit construction for the optimal measurement that maximizes the
discriminating power.

Lemma 3 If the POVM $\{E_i\}$ is optimal in the sense that it maximizes the divergence rate, then all its elements can
be chosen to be pure: $E_i = p_i |\psi_i\rangle\langle\psi_i|$.

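Proof: Assume that the first element of the POVM with $r$ elements $\{E_i\}$ has rank $k_1 > 1$, i.e. $E_1 = \sum_{i=1}^{k_1} p_i |\psi_i\rangle\langle\psi_i|$.
We will show that we can construct another POVM with $r + 1$ elements which leads to a divergence rate that is at least as large, and for
which the rank of $\tilde{E}_1$ is $k_1 - 1$ and the rank of $\tilde{E}_{r+1}$ is equal to 1. Then the proof follows by induction. Let us
therefore define $\tilde{E}_1 = \sum_{i=2}^{k_1} p_i |\psi_i\rangle\langle\psi_i|$ and $\tilde{E}_{r+1} = p_1 |\psi_1\rangle\langle\psi_1|$, so that $E_1 = \tilde{E}_1 + \tilde{E}_{r+1}$. Then it is enough to prove the semidefinite inequality

$$ \frac{|E_1\rangle\langle E_1|}{\mathrm{Tr}[E_1 \sigma]} \le \frac{|\tilde{E}_1\rangle\langle \tilde{E}_1|}{\mathrm{Tr}[\tilde{E}_1 \sigma]} + \frac{|\tilde{E}_{r+1}\rangle\langle \tilde{E}_{r+1}|}{\mathrm{Tr}[\tilde{E}_{r+1} \sigma]}. \tag{31} $$

Indeed, if this inequality is true, then the optimization over $\rho$ will necessarily yield a larger value. Since we are working
in an effective 2-dimensional subspace spanned by $\tilde{E}_1$ and $\tilde{E}_{r+1}$, we have to prove that the $2 \times 2$ matrix

$$ M = g_1 |\tilde{E}_1\rangle\langle \tilde{E}_1| + g_2 |\tilde{E}_{r+1}\rangle\langle \tilde{E}_{r+1}| - g_{12} \left( |\tilde{E}_1\rangle\langle \tilde{E}_{r+1}| + |\tilde{E}_{r+1}\rangle\langle \tilde{E}_1| \right) $$

is positive semidefinite, where

$$ g_1 = \frac{\mathrm{Tr}[\tilde{E}_{r+1} \sigma]^2}{\mathrm{Tr}[\tilde{E}_1 \sigma]\, \mathrm{Tr}[\tilde{E}_{r+1} \sigma] \left( \mathrm{Tr}[\tilde{E}_1 \sigma] + \mathrm{Tr}[\tilde{E}_{r+1} \sigma] \right)}, $$

$$ g_2 = \frac{\mathrm{Tr}[\tilde{E}_1 \sigma]^2}{\mathrm{Tr}[\tilde{E}_1 \sigma]\, \mathrm{Tr}[\tilde{E}_{r+1} \sigma] \left( \mathrm{Tr}[\tilde{E}_1 \sigma] + \mathrm{Tr}[\tilde{E}_{r+1} \sigma] \right)}, $$

$$ g_{12} = \frac{\mathrm{Tr}[\tilde{E}_1 \sigma]\, \mathrm{Tr}[\tilde{E}_{r+1} \sigma]}{\mathrm{Tr}[\tilde{E}_1 \sigma]\, \mathrm{Tr}[\tilde{E}_{r+1} \sigma] \left( \mathrm{Tr}[\tilde{E}_1 \sigma] + \mathrm{Tr}[\tilde{E}_{r+1} \sigma] \right)}. $$

Since this is a $2 \times 2$ matrix, it suffices to compute the trace and the determinant to verify positivity:

$$ \det(M) = (g_1 g_2 - g_{12}^2) \left( \|\tilde{E}_1\|^2 \|\tilde{E}_{r+1}\|^2 - |\langle \tilde{E}_1 | \tilde{E}_{r+1} \rangle|^2 \right) = 0, \tag{35} $$

$$ \mathrm{Tr}[M] = \frac{\left\| \mathrm{Tr}[\tilde{E}_{r+1} \sigma]\, |\tilde{E}_1\rangle - \mathrm{Tr}[\tilde{E}_1 \sigma]\, |\tilde{E}_{r+1}\rangle \right\|^2}{\mathrm{Tr}[\tilde{E}_1 \sigma]\, \mathrm{Tr}[\tilde{E}_{r+1} \sigma] \left( \mathrm{Tr}[\tilde{E}_1 \sigma] + \mathrm{Tr}[\tilde{E}_{r+1} \sigma] \right)} \ge 0. \tag{36} $$

This is obviously a positive rank 1 operator, which finishes the proof. □
We are now ready to prove matching lower and upper bounds to $\xi(\sigma)$.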

A. Upper bound to the divergence rate

An equivalent characterization of the divergence rate $\xi(\sigma)$ can be obtained by introducing the traceless operator
$X = (\rho - \sigma)/\epsilon$:

$$ \xi(\sigma) = \max_{\{E_i\}} \min_{X} \langle X | \sum_i |E_i\rangle \frac{1}{\mathrm{Tr}[E_i \sigma]} \langle E_i| \, | X \rangle \tag{37} $$

under the conditions

$$ \sum_i E_i = 1, \qquad \mathrm{Tr}[X X^\dagger] = 1, \qquad \mathrm{Tr}[X] = 0, \qquad X = X^\dagger. $$

The sum over $i$ is unlimited, i.e. there is no limit on the number of POVM elements, and the dimension of $X$ is the
dimension of the Hilbert space corresponding to $\sigma$, i.e. $D$-dimensional. Note that $\epsilon$ factored out due to the quadratic
dependence on $\rho - \sigma = \epsilon X$. Without loss of generality, we will work in the basis in which $\sigma$ is diagonal:

$$ \sigma = \sum_\alpha \lambda_\alpha |\alpha\rangle\langle\alpha| $$

with the eigenvalues $\lambda_\alpha = (s_\alpha)^2$ ordered in decreasing order. We will also assume that $\sigma$ is full rank; if this condition
is not satisfied, then we can always perturb $\sigma$ infinitesimally, and take the limit at the end.
We will prove the following lemma: 

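Lemma 4 An upper bound to $\xi(\sigma)$ defined in (37) is given by the smallest nonzero eigenvalue of the matrix

$$ P_s \left( \sum_{\alpha=1}^{D} \frac{1}{1 + \lambda_\alpha}\, |\alpha\rangle\langle\alpha| \right) P_s \tag{38} $$

with $P_s$ the projector on the subspace orthogonal to the vector with components $s_\alpha / \sqrt{1 + \lambda_\alpha}$.
A simple upper bound to this upper bound is

$$ \frac{1}{1 + \lambda_2} $$

with $\lambda_2$ the second largest eigenvalue of $\sigma$.

Note that this upper bound lies between 2/3 and 1 for any density matrix $\sigma$.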
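
The eigenvalue problem of Lemma 4 can be evaluated numerically without a linear algebra library. The sketch below rests on assumptions of our own that should be checked against the lemma: we take the matrix to be $P_s\, \mathrm{diag}(1/(1+\lambda_\alpha))\, P_s$ with $P_s$ projecting orthogonally to the vector with components $s_\alpha/\sqrt{1+\lambda_\alpha}$, and we assume a full-rank $\sigma$ with non-degenerate eigenvalues. Under these assumptions the smallest nonzero eigenvalue is the root of the secular equation $\sum_\alpha w_\alpha / (a_\alpha - \mu) = 0$, with $a_\alpha = 1/(1+\lambda_\alpha)$ and $w_\alpha = \lambda_\alpha/(1+\lambda_\alpha)$, that lies between the two smallest $a_\alpha$; the code finds it by bisection. The function name is hypothetical.

```python
def upper_bound_xi(eigenvalues, iterations=200):
    """Smallest nonzero eigenvalue of P_s diag(1/(1+lambda)) P_s via the secular equation."""
    lam = sorted(eigenvalues, reverse=True)
    a = [1.0 / (1.0 + l) for l in lam]        # diagonal entries, in increasing order
    w = [l / (1.0 + l) for l in lam]          # squared components of the excluded vector

    def secular(mu):
        return sum(wi / (ai - mu) for wi, ai in zip(w, a))

    lo, hi = a[0], a[1]                       # the root lies strictly between these values
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:            # interval collapsed to machine precision
            break
        if secular(mid) < 0.0:                # secular function is increasing on (lo, hi)
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

qubit_bound = upper_bound_xi([0.7, 0.3])      # hypothetical qubit spectrum
qutrit_bound = upper_bound_xi([0.5, 0.3, 0.2])
```

For a qubit the bisection reproduces the closed form $1/(1 + 2\lambda_1\lambda_2)$, and in general the result lies between $1/(1+\lambda_1)$ and $1/(1+\lambda_2)$, in line with the simple bound stated in the lemma.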

Proof: The proof of the theorem is a bit involved. In this proof, we will assume that the elements of the POVM 
are given by piEi with Ei = | |, ( | ) = 1, Y^iPi^i = ^ and pi < 0, Y^iPi = D. 
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As a first step, we observe that a consequence of the fact that a is diagonal, thus we can twirl the POVM elements: 



TT[E,a] = TT[E^D{-9)aD{d)] = 



J dOidO; 



2 • • • 



Here D{9) is a diagonal matrix with elements D^k — exp{i9k)- Therefore, two POVM's related by Ei ~ 
D{9)EiD{—9) will give the same value in the optimization of ([57]) as we can just transform the related X to 
X = D(—9)XD{9). It is therefore clear that an upper bound to ((37)) is obtained by solving the problem 



max mm 



/ d9id92 ■ ■ ■ {X\D{-9) ® D{9) (E. mji^{E.\ ) D{9) ^ 



{Ei} X J d9id9: 



'2 • • • 

as this forces to use the same X for different realizations of all equivalent POVM's related by such a "gauge 
transformation" . This is equivalent to saying that the minimum eigenvalue of a convex combination of operators with 
the same eigenvalues is always larger then the minimum of the individual eigenvalues. This twirling integration can 
be done exactly, and by using the cyclicity of the trace we get 

Y / d9id92 ■ ■ ■ D{9) ® D{-9)\X) {X\D{-9) ® D{9) ^ l^im/fll 
X^ ^0 = XaaXfsp\a)\a){f3\{(3\ + \Xap\ \a}{a\ (g) \/3}{(3\ 

Substituting this into ([57)) . we get 

{E^X\E') 



t[a) < max mm > p, — ; — -; — - 



As = are pure POVM elements, 

(i?»(a| ® = E^^E^p = {E^\a)\a){m\E')- 

Let's now define a new vector |e')with D components that contains the diagonal elements of E^: = E'aon and also 
the vector \s) with D elements given by Sa — and Aq the eigenvalues of a. 

Substituting all this into the previous expressions, we get 

.V- i:o.p{^'W){pw){\Xo.p?-i^~5,,p)+Xo.c.xpp) 

i{a) < max nun ^p, j-^^ 

Note that we have the constraints 

= 1 

i 

a 

a/3 



The biggest problem in doing the optimization of equation (37) is the presence of the denominator. Now is the 
time to get rid of it: we will choose $X$ such that 

$$\sum_{\alpha\beta} \langle e^i|\alpha\rangle\langle\beta|e^i\rangle \left(|X_{\alpha\beta}|^2 (1-\delta_{\alpha\beta}) + X_{\alpha\alpha}X_{\beta\beta}\right) = 2\,\langle e^i|s^2\rangle\langle e^i|t^2\rangle$$

with $\langle\alpha|s^2\rangle = s_\alpha^2$ and the vector $|t^2\rangle$ with elements $\langle\alpha|t^2\rangle = |t_\alpha|^2$ still to be determined. Note that any choice of $X$ will give us an 
upper bound as long as the constraints above are satisfied. If it is possible to choose such a $|t\rangle$, then the upper bound 
becomes equal to 

$$\xi(\sigma) \le 2\sum_i p_i \langle e^i|t^2\rangle = 2\sum_\alpha |t_\alpha|^2 \qquad (39)$$

This implies that such an $X$ and the corresponding $t$ completely eliminate the $e^i$ from the upper bound, which was what 
we were looking for. It is indeed possible to choose such an $X$: 

$$|X_{\alpha\beta}|^2 = (s_\alpha t_\beta - s_\beta t_\alpha)^2$$
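The cancellation of the denominator can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes real $t_\alpha$ and the diagonal choice $X_{\alpha\alpha} = \sqrt{2}\,s_\alpha t_\alpha$, which is inferred here from the requirement that the numerator factorizes; the function names are hypothetical.

```python
import math
import random

def numerator(e, s, t):
    """Sum over a, b of e_a e_b (|X_ab|^2 (1 - delta_ab) + X_aa X_bb),
    with |X_ab|^2 = (s_a t_b - s_b t_a)^2 off the diagonal and the
    ASSUMED diagonal X_aa = sqrt(2) s_a t_a (see the lead-in)."""
    D = len(e)
    x_diag = [math.sqrt(2.0) * s[a] * t[a] for a in range(D)]
    total = 0.0
    for a in range(D):
        for b in range(D):
            off = (s[a] * t[b] - s[b] * t[a]) ** 2 if a != b else 0.0
            total += e[a] * e[b] * (off + x_diag[a] * x_diag[b])
    return total

def factorized(e, s, t):
    """2 <e|s^2> <e|t^2>: the form in which the denominator <e|s^2> cancels."""
    es2 = sum(ea * sa * sa for ea, sa in zip(e, s))
    et2 = sum(ea * ta * ta for ea, ta in zip(e, t))
    return 2.0 * es2 * et2

def random_instance(D, rng):
    """Random spectrum (via s_a = sqrt(lambda_a)), real t and non-negative e."""
    lam = [rng.random() for _ in range(D)]
    norm = sum(lam)
    s = [math.sqrt(x / norm) for x in lam]
    t = [rng.uniform(-1.0, 1.0) for _ in range(D)]
    e = [rng.random() for _ in range(D)]
    return e, s, t
```

The identity holds for arbitrary $e^i$ and $t$; the trace and norm constraints on $X$ only restrict $t$ afterwards.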

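The constraints on $X$ can now be written in terms of the new variables $t_\alpha$: 

$$0 = \sum_\alpha X_{\alpha\alpha} = \sqrt{2}\sum_\alpha s_\alpha t_\alpha$$

$$1 = \sum_{\alpha\beta} |X_{\alpha\beta}|^2 = \sum_{\alpha\neq\beta} (s_\alpha t_\beta - s_\beta t_\alpha)^2 + 2\sum_\alpha (s_\alpha t_\alpha)^2 = 2\left(\sum_\alpha t_\alpha^2 + \sum_\alpha s_\alpha^2 t_\alpha^2\right)$$

Note that we made use of the normalization of $\sigma$ in the form of $\sum_\alpha s_\alpha^2 = 1$ and also of the constraint $\sum_\alpha s_\alpha t_\alpha = 0$. 
Rescaling $t^2$ by a factor of 2, we get the optimization problem: 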

minimize $\sum_{\alpha=1}^D t_\alpha^2$ 

under the condition $\sum_{\alpha=1}^D s_\alpha t_\alpha = 0$ 

and $\sum_{\alpha=1}^D (1 + s_\alpha^2)\, t_\alpha^2 = 1$ 



This optimization problem can actually be written as an eigenvalue problem: define $x_\alpha = \sqrt{1 + s_\alpha^2}\, t_\alpha$ and $P$, the 
projector on the space orthogonal to the vector with components $s_\alpha/\sqrt{1 + s_\alpha^2}$. Then the upper bound is given by the 
second smallest eigenvalue (the smallest being 0) of the matrix 

$$P\left(\sum_\alpha \frac{1}{1 + s_\alpha^2}\,|\alpha\rangle\langle\alpha|\right)P \qquad (40)$$

This is the upper bound that we set out to prove. A simple upper bound to this upper bound can be found. By 
making use of the interlacing properties of eigenvalues of submatrices, we know that the eigenvalues $\mu_k$ of this 
matrix obey 

$$\mu_1 = 0 \le \frac{1}{1+\lambda_1} \le \mu_2 \le \frac{1}{1+\lambda_2} \le \cdots$$

with $\lambda_1 \ge \lambda_2 \ge \cdots$ the eigenvalues of $\sigma$ in decreasing order, which proves that 

$$\xi(\sigma) \le \frac{1}{1+\lambda_2} \le 1.$$



This concludes the proof. □ 
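As an illustration of this upper bound, the sketch below evaluates the second smallest eigenvalue of the projected matrix for a given spectrum of $\sigma$, using only the Python standard library. It is a hedged numerical aid rather than part of the proof: the function names are hypothetical, and power iteration is a choice made here for simplicity (any symmetric eigensolver would do).

```python
import math

def upper_bound(lam, iters=5000):
    """Second smallest eigenvalue of P A P, with A = diag(1 / (1 + lam_a)) and
    P the projector orthogonal to the vector with components
    s_a / sqrt(1 + s_a^2), where s_a^2 = lam_a."""
    u = [math.sqrt(x / (1.0 + x)) for x in lam]
    norm_u = math.sqrt(sum(c * c for c in u))
    u = [c / norm_u for c in u]
    b = [x / (1.0 + x) for x in lam]  # diagonal of 1 - A, entries in [0, 1/2]

    def project(v):
        dot = sum(ui * vi for ui, vi in zip(u, v))
        return [vi - dot * ui for ui, vi in zip(u, v)]

    # Power iteration for the largest eigenvalue nu of P (1 - A) P.
    # On the range of P the wanted eigenvalue is then 1 - nu.
    v = project([math.cos(k + 1.0) for k in range(len(lam))])
    nu = 0.0
    for _ in range(iters):
        n = math.sqrt(sum(c * c for c in v))
        if n == 0.0:
            break
        v = [c / n for c in v]
        w = project([bi * vi for bi, vi in zip(b, v)])
        nu = sum(vi * wi for vi, wi in zip(v, w))  # Rayleigh quotient
        v = w
    return 1.0 - nu

def simple_bound(lam):
    """The weaker interlacing bound 1 / (1 + lambda_2), lambda_2 second largest."""
    return 1.0 / (1.0 + sorted(lam, reverse=True)[1])
```

For a pure state the routine returns 1, and for two equal eigenvalues of 1/2 it returns 2/3, the two extreme values discussed in this section.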

B. Lower bound to the divergence rate 

Let us next prove a lower bound to the divergence rate $\xi(\sigma)$. For this, we will have to guess a class of good POVM's. 
We will do the optimization over the class of POVM's parameterized by 1 parameter $0 < p < 1$: 

$$1 \le i \le D: \quad E_i = (1-p)\,|i\rangle\langle i| \qquad (41)$$
$$j > D: \quad E_j = c(p)\,|\chi_j\rangle\langle\chi_j| \qquad (42)$$

where the $\{|i\rangle\}$ label the eigenstates of $\sigma$. All $|\chi_j\rangle$ are chosen such that they have the same overlap with $\sigma$: 
$\langle\chi_j|\sigma|\chi_j\rangle = 1/D$. Those states $|\chi_j\rangle$ are hence only susceptible to the off-diagonal elements of $\sigma$. In the case of $D$ a 
prime or a power of a prime, a possible choice of such a basis is given by the mutually unbiased bases, but as we only 
require unbiasedness with the standard basis, such a basis can easily be constructed in any dimension, e.g. by choosing 
bases labeled by the angles $\{\theta_j\}$. We will choose such a basis that is invariant under any similarity transformation with 
diagonal elements $D_{kk} = \exp(i\theta_k)$ (which is always possible), such that we have $\sum_{j>D} E_j = c(p)\sum_{j>D} |\chi_j\rangle\langle\chi_j| = p\,\mathbb{1}$. 
This defines $c(p)$ which we do not have to determine explicitly. It follows that 

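$$\sum_{j>D} \frac{|E_j\rangle\langle E_j|}{\mathrm{Tr}[E_j\sigma]} = p\left(\sum_{\alpha\neq\beta} |\alpha\beta\rangle\langle\alpha\beta| + \sum_{\alpha\beta} |\alpha\alpha\rangle\langle\beta\beta|\right) \qquad (43)$$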

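This follows from the fact that the operator is invariant under twirling, and also because 

$$pD = \mathrm{Tr}\sum_{j>D} E_j = c(p)\,\mathrm{Tr}\sum_{j>D} |\chi_j\rangle\langle\chi_j| = c(p)\,\mathrm{Tr}\sum_{j>D} |\chi_j\rangle\langle\chi_j| \otimes |\chi_j\rangle\langle\chi_j|. \qquad (44)$$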

With those choices, there is hence only 1 parameter left, i.e. the weight $p$ that weights the diagonal versus the 
off-diagonal parts of the density matrix $\sigma$. A lower bound on $\xi(\sigma)$ can now be obtained by the following optimization: 




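$$\max_p \min_X\, \langle X| \left[(1-p)\sum_i \frac{1}{\lambda_i}\,|ii\rangle\langle ii| + p\left(\sum_{i\neq j} |ij\rangle\langle ij| + \sum_{ij} |ii\rangle\langle jj|\right)\right] |X\rangle \qquad (45)$$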



with $X = (\rho - \sigma)/\epsilon$ a traceless Hermitian operator with norm $\|X\|_2 = 1$. We therefore want to make the smallest 
eigenvalue of the matrix $Q$ appearing between $\langle X|$ and $|X\rangle$ in (45) as large as possible, as this eigenvalue provides a lower bound to $\xi$. The matrix $Q$ is a 
direct sum $Q_1 \oplus Q_2$ where $Q_1$ is $p$ times the identity matrix on the subspace spanned by $|ij\rangle$, $i \neq j$, and $Q_2$ the $D \times D$ 
matrix 



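$$Q_2 = (1-p)\sum_{\alpha=1}^D \frac{1}{\lambda_\alpha}\,|\alpha\rangle\langle\alpha| + p\sum_{\alpha,\beta=1}^D |\alpha\rangle\langle\beta| \qquad (46)$$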

where we identified $|\alpha\rangle = |\alpha\alpha\rangle$. Actually, this is not entirely correct, as we still have to include the constraint that 
$\mathrm{Tr}[X] = 0$. This can easily be incorporated by projecting $Q_2$ on the subspace orthogonal to $|\Omega\rangle = \frac{1}{\sqrt{D}}\sum_\alpha |\alpha\rangle$. 
Given $P = \mathbb{1} - |\Omega\rangle\langle\Omega|$, we therefore define $\tilde{Q}_2 = PQ_2P$. 

The smallest eigenvalue of $Q_1$ is obviously proportional to $p$, while the smallest eigenvalue of $\tilde{Q}_2$ is monotonically 
decreasing with $p$. Therefore, the optimal value of $p$ will be the one for which the smallest eigenvalues of $Q_1$ and $\tilde{Q}_2$ 
coincide. This is equivalent to determining the largest $p$ for which 

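$$(1-p)\sum_{\alpha=1}^D \frac{1}{\lambda_\alpha}\,P|\alpha\rangle\langle\alpha|P + p\sum_{\alpha,\beta=1}^D P|\alpha\rangle\langle\beta|P \ge pP \qquad (47)$$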

which is in turn equivalent to maximizing p such that 



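$$\sum_\alpha \frac{1}{\lambda_\alpha}\,P|\alpha\rangle\langle\alpha|P \ge p\left(P + \sum_\alpha \frac{1}{\lambda_\alpha}\,P|\alpha\rangle\langle\alpha|P - \sum_{\alpha\beta} P|\alpha\rangle\langle\beta|P\right) \qquad (48)$$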



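This optimal $p$, which is the lower bound we were looking for, is then given by 

$$p = \frac{1}{\mu} \qquad (49)$$

with $\mu$ the largest eigenvalue of the matrix 

$$\mathbb{1} - \frac{1}{D}\sum_{\alpha\beta} |\alpha\rangle\langle\beta| + \sum_\alpha \lambda_\alpha\,|\alpha\rangle\langle\alpha| - \sum_{\alpha\beta} \lambda_\alpha\lambda_\beta\,|\alpha\rangle\langle\beta| \qquad (50)$$

which is equivalent to 1 plus the largest eigenvalue of the pseudo-inverse of the matrix $P\sigma^{-1}P$: 

$$S = \sum_\alpha \lambda_\alpha\,|\alpha\rangle\langle\alpha| - \sum_{\alpha\beta} \lambda_\alpha\lambda_\beta\,|\alpha\rangle\langle\beta| \qquad (51)$$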

$S$ is again the generator of a semi-group, and hence all its eigenvalues are larger than or equal to zero. It is equal to 
zero for pure states, and the maximal possible eigenvalue is equal to 1/2 and is obtained for the case $\lambda_1 = \lambda_2 = 1/2$, 
$\lambda_{i>2} = 0$. Those 2 cases correspond to $\xi = 1$ and $\xi = 2/3$ respectively. It can easily be shown that the pseudo-inverse 
of the matrix $S$ has the same eigenvalues as the matrix (40). This means that our lower bound coincides with the 
upper bound! We have therefore proven: 

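Theorem 5 The divergence rate $\xi(\sigma)$ is equal to $\xi(\sigma) = 1/(1 + \mu(S))$ with $\mu(S)$ the largest eigenvalue of the matrix 

$$S = \sum_\alpha \lambda_\alpha\,|\alpha\rangle\langle\alpha| - \sum_{\alpha\beta} \lambda_\alpha\lambda_\beta\,|\alpha\rangle\langle\beta| \qquad (52)$$

where $\lambda_\alpha$ are the eigenvalues of $\sigma$ and $|\alpha\rangle$ the corresponding eigenvectors. In particular, this implies that 

$$\frac{2}{3} \le \xi(\sigma) \le 1 \qquad (53)$$

with the value of 2/3 obtained in the case where $\lambda_1 = \lambda_2 = 1/2$, $\lambda_{i>2} = 0$, and the value 1 when $\sigma$ is a pure state. 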
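A possible choice for a POVM that gives the optimal error rate is given as follows: 

$$1 \le i \le D: \quad E_i = (1-\xi)\,|i\rangle\langle i| \qquad (54)$$
$$j > D: \quad E_j = c(\xi)\,|\chi_j\rangle\langle\chi_j| \qquad (55)$$
$$|\chi_j\rangle = \frac{1}{\sqrt{D}}\sum_\alpha e^{i\theta^j_\alpha}\,|\alpha\rangle \qquad (56)$$

with $c(\xi)$ and the angles $\{\theta^j\}$ chosen such that the POVM is informationally complete and that $c(\xi)\sum_{j>D} |\chi_j\rangle\langle\chi_j| = \xi\,\mathbb{1}$. 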

Note that the degrees of freedom in the distribution corresponding to this optimal POVM can easily be reduced 
by dividing the POVM up in several resolutions of the identity, and fixing the number of times those different 
measurements are done by a fraction corresponding to their weight given in the theorem. For example, let us assume 
that the $|\chi_j\rangle$ can be divided up into $D$ orthonormal bases (as e.g. in the case of mutually unbiased bases), and 
that we want to do a total of $N$ measurements. Then the von Neumann measurement in the basis $|i\rangle$ can be 
done $(1-\xi)N$ times and each of the other von Neumann measurements $\xi N/D$ times. The total degrees of freedom for the 
corresponding distribution is then given by $(D+1)D - (D+1) = D^2 - 1$ which is indeed equal to the total number 
of degrees of freedom in the density matrix. It is clear that exactly the same arguments for the error exponent carry 
through in this case. 
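The statement of the theorem is easy to evaluate numerically. The following sketch computes $\mu(S)$ for $S = \mathrm{diag}(\lambda) - \lambda\lambda^T$ and from it the divergence rate, together with the measurement counts described in the note above. It is illustrative only: the function names are hypothetical, power iteration is used here purely to stay within the standard library, and the per-basis count $\xi N/D$ assumes that the $|\chi_j\rangle$ split into $D$ orthonormal bases.

```python
import math

def largest_eig_S(lam, iters=5000):
    """Largest eigenvalue of S = diag(lam) - lam lam^T by power iteration.
    S is positive semidefinite, so the Rayleigh quotient converges to mu(S)."""
    def apply_S(v):
        dot = sum(l * x for l, x in zip(lam, v))
        return [l * x - l * dot for l, x in zip(lam, v)]

    v = [math.cos(k + 1.0) for k in range(len(lam))]  # deterministic start vector
    mu = 0.0
    for _ in range(iters):
        n = math.sqrt(sum(x * x for x in v))
        if n == 0.0:  # S vanishes on v, e.g. for a pure state
            break
        v = [x / n for x in v]
        w = apply_S(v)
        mu = sum(x * y for x, y in zip(v, w))
        v = w
    return mu

def divergence_rate(lam):
    """xi(sigma) = 1 / (1 + mu(S)) for a state with eigenvalues lam."""
    return 1.0 / (1.0 + largest_eig_S(lam))

def allocation(lam, n_total):
    """Counts: (1 - xi) N in the eigenbasis and xi N / D in each of D other bases."""
    D = len(lam)
    xi = divergence_rate(lam)
    return (1.0 - xi) * n_total, [xi * n_total / D] * D
```

For example, `divergence_rate([0.5, 0.5])` evaluates the case $\lambda_1 = \lambda_2 = 1/2$ of the theorem, and `allocation(lam, N)` returns non-integer counts that would still need rounding in practice.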

C. Examples of divergence rates 

Let us next look at some specific examples. A special role is played by the second largest eigenvalue of $\sigma$: $\xi$ 
is minimized when $\lambda_2$ is maximal, and maximized when $\lambda_2$ is minimal. The maximal divergence rate is obviously 
obtained for pure states and is exactly given by 1: 

$$\xi(|\psi\rangle\langle\psi|) = 1 \qquad (57)$$

Furthermore, the states for which it is most difficult to do hypothesis testing are the ones corresponding to projectors 
on a 2-dimensional subspace: 

$$\xi(\sigma) = \frac{2}{3} \qquad (58)$$

for $\sigma = P/2$. However, there is clearly not a big discrepancy between 2/3 and 1, so the test will perform well for any 
state $\sigma$. 

Another interesting class of states contains all maximally mixed states: here 

$$\xi(\mathbb{1}/D) = \frac{D}{D+1}. \qquad (59)$$
Finally, $\xi(\sigma)$ can be calculated analytically for any density matrix defined on a 2-level system: 

$$\xi(\sigma) = \frac{1}{1 + 2\lambda_1\lambda_2} \qquad (60)$$

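Following the constructive proof of the lower bound, a POVM with 6 elements that saturates this is given by 

$$\left\{(1-\xi(\sigma))\,|0\rangle\langle 0|,\ (1-\xi(\sigma))\,|1\rangle\langle 1|,\ \tfrac{\xi}{2}\,|+\rangle\langle +|,\ \tfrac{\xi}{2}\,|-\rangle\langle -|,\ \tfrac{\xi}{2}\,|i\rangle\langle i|,\ \tfrac{\xi}{2}\,|{-i}\rangle\langle {-i}|\right\} \qquad (61)$$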



where we work in the basis where $\sigma$ is diagonal, and with $|\pm\rangle$ and $|\pm i\rangle$ the eigenbasis of the Pauli matrices $\sigma_x$ and 
$\sigma_y$. An optimal $\chi^2$ test with 3 degrees of freedom on $N$ samples is then obtained by doing $(1-\xi)N$ measurements 
in the computational basis and $\xi N/2$ in both the $\sigma_x$ and $\sigma_y$ basis. 
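The six-element qubit measurement can be written out explicitly to confirm that it is a valid POVM. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, $\sigma$ is taken diagonal in the computational basis, and the closed form $\xi = 1/(1 + 2\lambda_1\lambda_2)$ used in `qubit_rate` is obtained here by applying Theorem 5 to a $2 \times 2$ spectrum.

```python
import math

def qubit_rate(lam1):
    """xi for a qubit with eigenvalues lam1 and 1 - lam1 (Theorem 5 with D = 2)."""
    return 1.0 / (1.0 + 2.0 * lam1 * (1.0 - lam1))

def outer(v):
    """Projector |v><v| for a 2-component vector."""
    return [[v[r] * v[c].conjugate() for c in range(2)] for r in range(2)]

def qubit_povm(xi):
    """Six weighted projectors: eigenbasis of sigma, then sigma_x and sigma_y bases."""
    r = 1.0 / math.sqrt(2.0)
    states = {
        "0": [1.0, 0.0], "1": [0.0, 1.0],
        "+": [r, r], "-": [r, -r],
        "+i": [r, 1j * r], "-i": [r, -1j * r],
    }
    weights = {"0": 1.0 - xi, "1": 1.0 - xi,
               "+": xi / 2.0, "-": xi / 2.0, "+i": xi / 2.0, "-i": xi / 2.0}
    return {k: [[weights[k] * x for x in row] for row in outer(states[k])]
            for k in states}

def povm_sum(povm):
    """Entrywise sum of all POVM elements; should be the 2 x 2 identity."""
    total = [[0j, 0j], [0j, 0j]]
    for element in povm.values():
        for r in range(2):
            for c in range(2):
                total[r][c] += element[r][c]
    return total
```

Summing the elements returned by `qubit_povm(qubit_rate(lam1))` gives the identity for any `lam1` between 0 and 1, since the two eigenbasis projectors carry total weight $1-\xi$ and each of the two other bases carries weight $\xi/2$.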

IV. DISCUSSION 

We have argued that hypothesis testing provides the natural framework for describing quantum experiments that 
aim at verifying that a certain density matrix is prepared in the lab. This evades the artificial problems of enforcing 
positivity etc. encountered in quantum tomography. 

We have studied the problem of hypothesis testing and goodness of fit testing of density matrices, and have 
focused on the $\chi^2$ test; as shown, most of the results that we derived also directly apply to the log-likelihood test. 
This provides a clear, simple and flexible framework for testing whether a given density matrix is produced by a 
certain experimental setup, and allows one to define confidence intervals that are independent of the particular system 
under consideration. We were also able to characterize divergence rates $\xi(\sigma)$ by doing an optimization over all 
possible POVM measurements maximizing the information, and proved that $2/3 \le \xi \le 1$. This allowed us to prove 
that, if we were sampling from a different density matrix $\rho$ instead of $\sigma$, this would be detected in a number of 
measurements proportional to $D/(\xi(\sigma)\|\rho - \sigma\|_2^2)$ with $D$ the dimension of the Hilbert space. Furthermore, we showed 
that this measurement is optimal from the point of view of both Pitman and Bahadur efficiency. 

Up till now, we assumed that there were no measurement errors. If such errors occur and the error model is known, 
the expected probabilities can easily be adjusted, and exactly the same analysis carries through. 

We have also not yet touched upon the problem of the estimation of the parameters describing the density matrix 
(in principle, the test can also be used to estimate all or a few parameters in the density matrix, and it turns out 
that this estimator is asymptotically optimal; this will be discussed in future work). The problem of estimation is 
obviously complementary to the topic of hypothesis testing; but independent of which procedure is used to estimate 
the density matrix, the procedure should always be complemented by doing hypothesis testing on an independent 
sample set. The big advantage of hypothesis testing versus parameter estimation is the exponential scaling of the 
confidence in the number of samples: if the measurement data is compatible with the expected ones, we accept the 
hypothesis, and otherwise we reject it. Note also that continuous distributions can be tested by the method; this 
can be achieved by binning the data. 

Also, we did not consider the question of entangled measurements on different copies. Just as in the case of 
tomography, this is a bit problematic as the total number of degrees of freedom increases exponentially with the 
number of copies on which joint measurements are done. However, it is possible to circumvent this problem and 
consider POVM measurements with few elements that only reveal full information about the marginals; this will also 
be discussed in future work. 

From a more philosophical point of view, the topic of hypothesis testing forces us to rethink what it means for a 
quantity to be physical and what not. For example, the expectation value of an observable is not observable, but can 
only be sampled. The resulting fluctuations are an integral part of doing an experiment, and if an experiment were to 
report frequencies that are too close to or too far from the expected ones, then such an experiment can be categorized as 
suspicious. The only things that are physical are the frequencies with which certain measurement outcomes are obtained, 
and the only goal of quantum mechanics is the prediction of those frequencies. 